Visual Spatial Reasoning

Abstract

Spatial relations are a basic part of human cognition. However, they are expressed in natural language in a variety of ways, and previous work has suggested that current vision-and-language models (VLMs) struggle to capture relational information. In this paper, we present Visual Spatial Reasoning (VSR), a dataset containing more than 10k text-image pairs with 66 types of spatial relations in English (e.g., under, in front of, facing). While using a seemingly simple annotation format, we show how the dataset includes challenging linguistic phenomena, such as varying reference frames. We demonstrate a large gap between human and model performance: The human ceiling is above 95%, while state-of-the-art models only achieve around 70%. We observe that VLMs’ by-relation performances have little correlation with the number of training examples and that the tested models are in general incapable of recognising relations concerning the orientations of objects.

Similar resources

Visual and Spatial Representations in Relational Reasoning

Psychologists have argued that visual imagery plays a vital role in human reasoning. If so, then reasoning with materials that are easy to visualize should be better than reasoning with materials that are hard to visualize. The literature, however, reports inconsistent results. Our starting point was that the inconsistencies arise from confounding imageability with the spatial nature of the mat...

Spatial Reasoning: No Need for Visual Information

One of the central questions of spatial reasoning research is whether the underlying processes are inherently visual or spatial. The article reports a dual-task experiment that was conducted to explore the visual and/or spatial nature of human spatial reasoning. The main tasks were inferences based on a spatial version of the interval calculus introduced by Allen (1983). The secondary tasks wer...

Mapping conceptual to spatial relations in visual reasoning.

In 3 experiments, the authors investigated the impact of goals and perceptual relations on graph interpretation when people evaluate functional dependencies between continuous variables. Participants made inferences about the relative rate of 2 continuous linear variables (altitude and temperature). The authors varied the assignments of variables to axes, the perceived cause-effect relation bet...

Spatial structures and visual attention in diagrammatic reasoning

The thesis addresses questions of relating diagrams and mental representations in diagrammatic reasoning through eye movement research. It proposes model-based representations of attention for improved human-computer cooperation. In particular, the thesis proposes in theory and details in practice a computational framework for live capture and analysis of eye movement data in diagrammatic reaso...

Spatial Reasoning as Verbal Reasoning

We introduce an approach in which spatial reasoning can be conceived as verbal reasoning. We describe a theory of how humans construct a mental representation given one-dimensional spatial relations. In this construction process, objects are inserted in a dynamic structure called a “queue”, which provides an implicit direction. The spatial interpretation of this direction can be chosen freely. This...

Journal

Journal title: Transactions of the Association for Computational Linguistics

Year: 2023

ISSN: 2307-387X

DOI: https://doi.org/10.1162/tacl_a_00566